Introduction to GIS and mapping in R

Mapping raster and vector data

Author

Brittany Barker

Published

June 14, 2024

Load packages

Overview

Combine the terra, sf, and ggplot2 packages to work with geospatial data and create beautiful maps.

Learning objectives

  • Plot and analyze spatial data to address a research question
  • Apply different mapping functions to plot spatial data
  • Describe the difference between vector and raster features
  • Adjust ggplot2 settings to customize maps

Mapping predicted vs. observed dates of lilac phenophases

In the following exercises, we will visualize spatial concordance between model-predicted and observed dates of lilac leaf out, which is a phenophase of lilac in which new leaves appear. A phenophase is an observable stage of an animal’s or plant’s life cycle that can be defined by a start and end point. All data were derived from the USA National Phenology Network, an organization that hosts phenology models and maintains a database of volunteer-contributed observations of phenology for ecologically and economically important species of plants and insects in the USA.

Exercise 1: Mapping predicted lilac leaf out

We import raster data for model predictions of lilac leaf out for 2018 using the rast() function in terra.

lilacModel_r <- rast("../data/lilac_2018_model.tiff")

Check out attributes of the raster, such as it’s class, coordinate reference system (crs), class, extent, and range of values correspond to the day in which lilac leaf out was predicted to occur in 2018.

class(lilacModel_r)
[1] "SpatRaster"
attr(,"package")
[1] "terra"
lilacModel_r
class       : SpatRaster 
dimensions  : 1228, 2606, 1  (nrow, ncol, nlyr)
resolution  : 0.02246419, 0.02107085  (x, y)
extent      : -125.0208, -66.47917, 24.0625, 49.9375  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 (EPSG:4269) 
source      : lilac_2018_model.tiff 
name        : layer 
min value   :     6 
max value   :   223 

We’ll go over three approaches for mapping this raster.

(1a) terra: plot()

The raster can be visualized using the plot() function in terra. This function can produce some decent looking maps, but not as nice as ggplot2 maps (in my opinion). Try it out below.

# Plot the raster
plot(lilacModel_r)

The color palette can changed using the col option. The default is rev(grDevices::terrain.colors(50)). Can you find a palette that you like better? (e.g., check out other palettes in viridis by typing viridis::).

# Continuous prediction using a different color palette
plot(lilacModel_r, col = rev(viridis::viridis(50)))

Raster data can be depicted in bins by specifying “interval” for the type option. The number of intervals is defined with the breaks option. Apply different numbers of breaks to see how it affects your map.

# Binned predictions 
plot(lilacModel_r, type="interval", breaks = 8, col = rev(viridis::viridis(8)))

# The legend falls off the map!

You can also classify the bins yourself using the classify() function. Try changing the classes object in the code below to see how it impacts the map.

# How should you define your bins?
classes <- seq(0, 230, 20)
# Now classify the raster
lilacModel_r2 <- classify(lilacModel_r, classes)
# Plot the results as before, except add `type="classes"` to the function
plot(lilacModel_r2, type="classes", col = rev(viridis::viridis(length(classes))))


Question 1: How does predicted lilac leaf out vary across the contiguous U.S.? Any ideas what may explain these patterns?

Answer: Leaf out occurs earlier in southern latitudes because warmer temperatures there drive faster development.


The plot() function has several other options for customizing maps. Below, I added a title, put axis labels on 3 sides.

plot(lilacModel_r, 
     type="interval", 
     breaks = 8, 
     col = rev(viridis::viridis(8)),
     plg=list(title="Predicted\nLeafout DOY", cex=0.9), pax=list(side=1:3))

(1b) Plotting rasters in ggplot2

Most people are more familiar with tidyverse packages and functions than those in terra or its predecessor raster. Thus, mapping rasters with ggplot2 is often preferred over plot(). We’ll cover two ggplot2 functions: (1) geom_raster(), which requires data to be in data frame format, and (2) geom_spatraster(), which can work directly with SpatRaster objects. The latter function is part of the relatively new tidyterra package. For both examples, days of the year will be binned into months to facilitate comparisons with observed lilac leaf out dates (Exercise 2, below). This task requires defining a factor variable, as described below.

ggplot2: geom_raster()

First, convert the raster into a data frame using as.data.frame() in terra. Set the xy option to TRUE to retain coordinate information (x and y columns), and rename the data column from layer to leaf_out_doy.

# Convert raster data to a data frame 
lilacModel_df <- lilacModel_r %>%
  as.data.frame(xy = TRUE) %>%
  rename("leaf_out_doy" = layer)
head(lilacModel_df)
              x        y leaf_out_doy
71692 -95.15470 49.35805          121
71693 -95.13223 49.35805          121
71694 -95.10977 49.35805          121
71695 -95.08731 49.35805          121
71696 -95.06484 49.35805          125
71697 -95.04238 49.35805          125

Next, we create a data frame that specifies the month for each day of year, which will be used for defining factor levels and for plotting in ggplot2.

# Data frame needed to create pretty plots
catgories_df <- data.frame(
  # Day and month of year
  leaf_out_doy = 1:365, 
  leaf_out_month = cut_interval(1:365, 12)) %>%
  # Bin dates by month and re-format labels to remove brackets, parentheses, etc.
  mutate(leaf_out_month = format(as.Date(
    leaf_out_doy, origin = "2018-01-01"), "%b"),
    leaf_out_month = gsub("\\(|\\]|\\[", "", leaf_out_month)) %>%
  mutate(leaf_out_month = gsub(",", "-", leaf_out_month))

# Convert month to a factor so they're in the right order on plots
# Factor levels are ordered by day of the year
catgories_df$leaf_out_month <- factor(
  catgories_df$leaf_out_month, 
  levels = unique(catgories_df$leaf_out_month[order(catgories_df$leaf_out_doy)]))

Join the data frame with factor levels to lilacModel_df and examine its structure.

# Join month labels to lilac predictions
lilacModel_df <- left_join(lilacModel_df, catgories_df, by = "leaf_out_doy") 
head(lilacModel_df)
          x        y leaf_out_doy leaf_out_month
1 -95.15470 49.35805          121            May
2 -95.13223 49.35805          121            May
3 -95.10977 49.35805          121            May
4 -95.08731 49.35805          121            May
5 -95.06484 49.35805          125            May
6 -95.04238 49.35805          125            May

Next, we create some custom labels for the plots, and define a pretty color palette.

# Create a vector of labels to show in plot legend
labs <- unique(catgories_df$leaf_out_month)

# Create a color palette (vector) for the plot
# Classic cyclic palette in "ggthemes" package
pal <- ggthemes_data[["tableau"]][["color-palettes"]][["regular"]]$`Classic Cyclic`
pal <- pal$value # Vector of hex codes for colors
pal <- pal[1:12] # Keep only 12 colors
names(pal) <- unique(catgories_df$leaf_out_month) # Name colors by month

Finally, the lilacModel_df data frame can be plotted using geom_raster().

# Create a map using ggplot2
ggplot() + 
  geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) 

This map looks okay but can definitely be improved. For example, we could add a plot title, custom legend title, and maybe get rid of the default gray background, grid lines, and axes labels. First add the titles.

# Create a map using ggplot2
ggplot() + 
  geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
  scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal) +
  ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") 

Now customize the map theme. If you’re producing multiple plots, it’s more concise to define your custom theme a single time (DRY principle = Don’t Repeat Yourself).

# Custom theme
my_theme <- theme(#panel.grid= element_blank(),
                 panel.background = element_blank(), 
                 panel.border = element_rect(fill = NA),
                 axis.title = element_blank(), 
                 #axis.text = element_blank(),
                 #axis.ticks = element_blank(),
                 plot.title = element_text(face = "bold", size = 16),
                 legend.key = element_rect(fill = "white"),
                 legend.title = element_text(face="bold", size = 12),
                 legend.text = element_text(size = 11),
                 legend.key.height = unit(0.0025, "cm"),
                 legend.key.width = unit(1, "cm"))

Apply the custom theme to the plot.

# Create a map using ggplot2
ggplot() + 
  geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
  scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal) +  
  ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") +
  my_theme # Our custom theme


Challenge 1: Change at least two plot features to align with your preferences. For example, you could remove the plot axes (lat and long), change the legend attributes, font types and sizes, etc. (my_theme), or change the color palette (pal).


FYI: The ggsn package has functions to add a scale bar and north arrow. Check it out sometime!

Adding boundary data to maps

The map is more informative if we add state boundaries. Some R packages provide these data, including rnaturalearth and USAboundaries ; however, installing them can sometimes be problematic. To avoid troubleshooting, I saved USA boundary data as shapefiles. First, we need to use st_use_s2(FALSE) because sf assumes the world is round, and uses great circles for straight lines. This will cause issues for cropping boundaries to the same extent as our raster data.

sf_use_s2(FALSE)
Spherical geometry (s2) switched off

Now import the shapefile using the st_read() function and plot the first column.

# US states feature
states <- st_read("../data/states.shp")
Reading layer `states' from data source 
  `/home/randre/Documents/Code/CASCADIA_R_Intro_to_GIS_2024/data/states.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 52 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  WGS 84
# Plot the first data column
plot(states[1])


Question 2: What is the class and geometry type of states? Describe two other differences between states and lilacModel_r.

Answer: the class is both sf and data.frame. The sf portion contains geographic information such as geometry, extent, and CRS, whereas the data frame has associated data, such as state names and abbreviations.


We’re going to run into issues due to differing coordinate reference systems (CRS) of states and lilacModel_r. You can see these differences by using the crs() function on both objects.

# CRS of states
crs(states)
[1] "GEOGCRS[\"WGS 84\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n            LENGTHUNIT[\"metre\",1]]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433]],\n    CS[ellipsoidal,2],\n        AXIS[\"geodetic latitude (Lat)\",north,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        AXIS[\"geodetic longitude (Lon)\",east,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n    ID[\"EPSG\",4326]]"

CRS of raster data.

# CRS of lilacModel_r
crs(lilacModel_r)
[1] "GEOGCRS[\"NAD83\",\n    DATUM[\"North American Datum 1983\",\n        ELLIPSOID[\"GRS 1980\",6378137,298.257222101004,\n            LENGTHUNIT[\"metre\",1]]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433]],\n    CS[ellipsoidal,2],\n        AXIS[\"geodetic latitude (Lat)\",north,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        AXIS[\"geodetic longitude (Lon)\",east,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n    ID[\"EPSG\",4269]]"

Change the CRS of states to match the raster data using st_transform. Check out the results.

# Transform coordinates of states
states <- st_transform(states, crs(lilacModel_r))
crs(states)
[1] "GEOGCRS[\"NAD83\",\n    DATUM[\"North American Datum 1983\",\n        ELLIPSOID[\"GRS 1980\",6378137,298.257222101004,\n            LENGTHUNIT[\"metre\",1]]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433]],\n    CS[ellipsoidal,2],\n        AXIS[\"geodetic latitude (Lat)\",north,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        AXIS[\"geodetic longitude (Lon)\",east,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n    ID[\"EPSG\",4269]]"

Next, use st_crop() to crop your states feature to the same extent as your raster, which is the contiguous U.S. Check out the results. What happens if you skip the st_transform() step and run this code?

# Crop to this extent
states_c <- st_crop(states, lilacModel_r)
although coordinates are longitude/latitude, st_intersection assumes that they
are planar
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
# Plot the result
plot(states_c[1])


Challenge 2: Can you think of a dplyr approach to obtaining data only for CONUS? Hint: remove rows with undesired states (note that Puerto Rico [PR] is in the data).

states_c2 <- filter(states, !stusps %in% c("AK", "HI", "PR"))
plot(states_c2[1])


Finally, use geom_sf() to add an sf object to a plot - i.e., the states feature.

# Add an 'sf' object to them map
ggplot() + 
  geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
  geom_sf(data = states_c, fill = NA) +
  scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal) +
  ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") +
  my_theme


Question 3: What happens when you don’t enter NA for the fill option for the states_c feature? Why?

Answer: the states features covers up the raster data because it’s multi-polygon data. The default fill color is gray.


You could just continue to use fill = NA, or convert the geometry of states_c to MULTILINESTRING using the st_cast() function.

# Convert geometry from polygon to multilinestring 
states_c %>%   
  st_cast("MULTILINESTRING")  
Simple feature collection with 49 features and 12 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: -124.7258 ymin: 24.49813 xmax: -66.9499 ymax: 49.38436
Geodetic CRS:  NAD83
First 10 features:
   statefp  statens     affgeod geoid stusps       name lsad        aland
1       06 01779778 0400000US06    06     CA California   00 403671196038
2       55 01779806 0400000US55    55     WI  Wisconsin   00 140292246684
3       16 01779783 0400000US16    16     ID      Idaho   00 214049923496
4       27 00662849 0400000US27    27     MN  Minnesota   00 206232157570
5       19 01779785 0400000US19    19     IA       Iowa   00 144659688848
6       29 01779791 0400000US29    29     MO   Missouri   00 178052563675
7       24 01714934 0400000US24    24     MD   Maryland   00  25151895765
8       41 01155107 0400000US41    41     OR     Oregon   00 248628426864
9       26 01779789 0400000US26    26     MI   Michigan   00 146614604273
10      30 00767982 0400000US30    30     MT    Montana   00 376973673895
         awater    stat_nm stt_bbr jrsdct_                       geometry
1   20294133830 California      CA   state MULTILINESTRING ((-118.594 ...
2   29343721650  Wisconsin      WI   state MULTILINESTRING ((-86.93428...
3    2391577745      Idaho      ID   state MULTILINESTRING ((-117.243 ...
4   18949864226  Minnesota      MN   state MULTILINESTRING ((-97.22904...
5    1085996889       Iowa      IA   state MULTILINESTRING ((-96.62187...
6    2487215790   Missouri      MO   state MULTILINESTRING ((-95.76564...
7    6979171386   Maryland      MD   state MULTILINESTRING ((-76.04622...
8    6170953359     Oregon      OR   state MULTILINESTRING ((-124.5525...
9  103872203398   Michigan      MI   state MULTILINESTRING ((-84.61622...
10   3866689601    Montana      MT   state MULTILINESTRING ((-116.0492...

You can add labels to your map using the geom_sf_text() or geom_sf_label() functions.

# Add an 'sf' object to them map
ggplot() + 
  geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
  geom_sf(data = states_c, fill = NA) +
  geom_sf_text(data = states_c, aes(label = stusps), size = 2) +
  scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal) +
  ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") +
  my_theme

ggplot2: geom_raster()

An identical map can be produced using the geom_spatraster() function in tidyterra, which works directly with SpatRaster objects. First, create a copy of your raster and convert the data into categorical (factor) format for plotting. The categories_df data frame is used to define factor levels.

# Define factor levels of the raster using the "catgories_df" data frame
lilacModel_r2 <- lilacModel_r
levels(lilacModel_r2) <- catgories_df

Look at both rasters and notice the data formats.

# Original raster (numeric data - days of the year)
lilacModel_r
class       : SpatRaster 
dimensions  : 1228, 2606, 1  (nrow, ncol, nlyr)
resolution  : 0.02246419, 0.02107085  (x, y)
extent      : -125.0208, -66.47917, 24.0625, 49.9375  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 (EPSG:4269) 
source      : lilac_2018_model.tiff 
name        : layer 
min value   :     6 
max value   :   223 
# New raster (categorical data - months of the year)
lilacModel_r2
class       : SpatRaster 
dimensions  : 1228, 2606, 1  (nrow, ncol, nlyr)
resolution  : 0.02246419, 0.02107085  (x, y)
extent      : -125.0208, -66.47917, 24.0625, 49.9375  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 (EPSG:4269) 
source      : lilac_2018_model.tiff 
categories  : leaf_out_month 
name        : leaf_out_month 
min value   :            Jan 
max value   :            Aug 

Create the same map as before but instead use geom_spatraster() with your categorical raster instead of geom_raster().

# Plot produced using "geom_spatraster"
conus_mod <- ggplot() +
  geom_spatraster(data = lilacModel_r2, aes(fill = leaf_out_month))  +
  geom_sf(data = states_c, fill = NA) +
  geom_sf_text(data = states_c, aes(label = stusps), size = 2) +
  scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal, na.value = "white") +
  ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") +
  my_theme

conus_mod

Save your map using ggsave() and see how it looks. If you’re not happy with it, adjust the image dimensions of change the plot theme (i.e., my_theme).

ggsave(conus_mod, filename = "../plots/Lilac_leaf_out_model_2018.png", 
       dpi = 300, units = c('in'), width = 10, height = 6)

Exercise 2: Adding phenometric data to the map

We can add more layers to our map to ask certain questions, such as: How do model-predicted dates of lilac leaf out compared with observed dates? Below, we import phenometric data for observed lilac leaf out dates for 2018. These data were collected by volunteers and submitted to the USA National Phenology Network’s database

# Phenometric data
lilacObs_df <- read.csv("../data/lilac_2018_obs.csv")

Take a look at the data dimensions. How many observations are there?

dim(lilacObs_df)
[1] 182  25

There are 77 unique sites in the dataset. The goal is to add the location for each observation to our previous map.

length(unique(lilacObs_df$site_id))
[1] 77

We only need data in three columns in the phenometric data: longitude, latitude, and first_yes_doy. The first_yes_doy column specifies which day of year that lilacs first had leaf out at the site. We’ll subset the columns using select() and take a look at the first several rows of data.

lilacObs_df <- lilacObs_df %>%
  filter(state != "ON") %>% # Remove an observation from Canada
  dplyr::select("longitude", "latitude", "first_yes_doy") %>%
  rename("leaf_out_doy" = "first_yes_doy") %>% 
  left_join(catgories_df, by = "leaf_out_doy") 

Take a look at the first several rows of data.

head(lilacObs_df)
  longitude latitude leaf_out_doy leaf_out_month
1 -87.89300 43.09780          121            May
2 -87.89300 43.09780          121            May
3 -92.75200 36.52450           53            Feb
4 -83.05769 43.72269          118            Apr
5 -76.55337 38.88813           57            Feb
6 -76.55337 38.88813           57            Feb

For plotting purposes, we convert the phenometric data to a simple feature (sf) object using the st_as_sf() function. You must specify which columns contain the coordinates, and also define the CRS, which should be the same as our other objects.

# Convert data frame to a simple feature
lilacObs_sf <- lilacObs_df %>%
   st_as_sf(coords = c("longitude", "latitude"), crs = crs(states))

Compare the geometry of lilacObs_sf and states.

# Lilac observations feature
lilacObs_sf$geometry
Geometry set for 181 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.7566 ymin: 35.01538 xmax: -70.89148 ymax: 47.95269
Geodetic CRS:  NAD83
First 5 geometries:
POINT (-87.893 43.0978)
POINT (-87.893 43.0978)
POINT (-92.752 36.5245)
POINT (-83.05769 43.72269)
POINT (-76.55337 38.88813)
# States feature
states$geometry
Geometry set for 52 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83
First 5 geometries:
MULTIPOLYGON (((-118.594 33.4672, -118.4848 33....
MULTIPOLYGON (((-86.93428 45.42115, -86.83575 4...
MULTIPOLYGON (((-117.243 44.39097, -117.2151 44...
MULTIPOLYGON (((-97.22904 49.00069, -96.93096 4...
MULTIPOLYGON (((-96.62187 42.77925, -96.57794 4...

Question 4: How does the geometry of the lilac observations (lilacObs_sf) differ from the states feature? Why?

Answer: the lilac observations have a POINT geometry whereas the states data have a MULTIPOLYGON geometry. Observations are from single locations, whereas state boundaries cover many points over the U.S.


Now let’s add the phenometric data so we can visualize spatial concordance between model-predicted and observed dates for first leaf out. Sites for each observation are plotted using their coordinate information and colored according to values in the leaf_out_date column.

# Add sites to the map
conus_modVobs <- ggplot() + 
  #geom_spatraster(data = lilacModel_r, aes(fill = leaf_out_month))  +
  geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
  geom_sf(data = states_c, fill=NA) +
  geom_sf(data = lilacObs_sf, aes(color = leaf_out_month), size = 3, show.legend = FALSE) +
  geom_sf(data = lilacObs_sf, color = "black", size = 3, shape = 1) + # black outlines around sites
  scale_fill_manual(name = "Month of First\nLeaf Out", label = labs, values = pal, drop = FALSE) +
  scale_color_manual(name = "Month of First\nLeaf Out", label = labs, values = pal, drop = FALSE) +
  ggtitle("Predicted vs. Observed 1st Leaf Out in Lilac in 2018") +
  my_theme
conus_modVobs

Save your map using ggsave() and see how it looks. If you’re not happy with it, adjust the image dimensions of change the plot theme (i.e., my_theme).

ggsave(conus_modVobs, filename = "../plots/Lilac_leaf_out_modelVsObs_2018.png", 
       dpi = 300, units = c('in'), width = 10, height = 6)

Question 5: What does your map tell you about predicted vs. observed first leaf out in lilac? Can you think of a better way to compare the two datasets on a map?

Answer: Our map indicates that predicted dates of lilac leaf out in 2018 did not always correspond to observed dates. In some cases, the model seems to under-predict dates (leaf out too early) whereas in others it over-predicts (leaf out too late). Notice that two sites had dates that were much later than predicted dates. In this case, perhaps the volunteers failed to notice that lilac leaf out had already occurred, although other explanations are possible.


Combining different maps

Let’s say we’re interested in results for a particular region of CONUS, such as a state. The map can be zoomed with the coord_sf() function, in which the x- and y-limits of the region are defined. In the commented coord_sf() function below, enter your own limits (xmin, xmax, ymin, ymax), but make sure they’re within the U.S.

# Map for specific region
# The expand = FALSE ensures that limits are taken exactly as provided
conus_modVobs +
  # Define x- and y-limits
    coord_sf(xlim = c(-90.5542, -82.3047), 
             ylim = c(41.6311, 47.5739),
             expand = FALSE) 

We can also add county boundaries by importing the shapefile in the data folder (./data/counties.shp). Remember to change the CRS as we did above for states. Also, feel free to convert the geometry to MULTILINESTRING.

# Simple feature ("sf") for US counties
counties <- st_read("../data/counties.shp")
Reading layer `counties' from data source 
  `/home/randre/Documents/Code/CASCADIA_R_Intro_to_GIS_2024/data/counties.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 3234 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
Geodetic CRS:  NAD83
counties <- counties %>%
  st_transform(crs = crs(states)) %>%
  st_cast("MULTILINESTRING") 

Add the counties to your map below, remove the legend and title, and make any other desired changes. Give the map a name so it can be used for the next step.

# Map for region of interest
# Lilac leaf out observations are added again so boundary lines don't cover them
zoom_modVobs <- conus_modVobs +
  geom_sf(data = counties, fill = NA, color = "gray30", size = 0.25) + # Add counties
  geom_sf(data = lilacObs_sf, aes(color = leaf_out_month), size = 3, show.legend = FALSE) +
  coord_sf(xlim = c(-90.5542, -82.3047), 
           ylim = c(41.6311, 47.5739),
           expand = FALSE) + # Take limits exactly as defined
  theme(legend.position = "none", 
        plot.title = element_blank())
zoom_modVobs

Finally, we can combine maps for CONUS and your region of interest using the patchwork package. The \ operator indicates that the map for CONUS goes on top of the region map, the width setting indicates that the CONUS is map is 2X bigger, and the legend is “collected” to ensure it stays in the correct position (on right-hand side).

# Combine maps using patchwork (ToDo - this still doesn't run [Roger])
both_modVobs <- conus_modVobs / zoom_modVobs +
  plot_layout(guides = "collect", widths = c(2, 1))
both_modVobs

Save your map and see what you think. Feel free to adjust the theme, plot dimensions, or anything else to improve the map.

ggsave(both_modVobs, filename = "../plots/Lilac_leaf_out_modelVsObs_2018_2.png", 
       dpi = 300, units = c('in'), width = 9, height = 9)

Note: Other options for combining plots include functions in the ggpubr [ggarrange()] and cowplot [plot_grid()] packages.

Exercise 3: Extract and analyze spatial data

There are numerous functions in terra for extracting, manipulating, and analyzing raster data. Here, we’ll use the extract function to extract model predictions (predicted day of year) from the precise location where lilac observations (observed day of year) were collected. This allows us to analyze rather than just visualize the two datasets.

Extract the predictions, convert results to a data frame, and rename the data column from layer to leaf_out_doy.

# Extract model predictions for each observation location
lilacObs_extr <- extract(lilacModel_r, lilacObs_sf) %>%
  as.data.frame() %>%
  rename("leaf_out_doy" = "layer") %>%
  as.data.frame() # 

Next, we can estimate the extent to which the model under-predicts vs. over-predicts the date of lilac leaf out by subtracting the leaf_out_doy columns.

differences <- lilacObs_extr$leaf_out_doy - lilacObs_sf$leaf_out_doy 

Create a histogram of differences using the hist() function in base R.

hist(differences, breaks = seq(-250, 50, 25))

Now calculate some simple summary statistics of the differences.

# Mean
mean(differences)
[1] -13.71271
# Range
range(differences)
[1] -237   48
# Median
median(differences)
[1] 1

Question 6: What information do your simple analyses provide that can’t be seen on a map? Overall, what do the results say about the performance of the model for lilac leaf out?

Answer: On average, predicted dates were 13.71 days earlier (range = -237 to 48 days) than observed dates, which suggests that the model is under-predicting the date of lilac leaf out. However, the extreme outliers are likely due to observer error, as we noted above. The model actually seems to perform pretty well if we reduce the impact of outliers by calculating the median difference, which is only 1 day.


Wrap-up

These exercises only scratched the surface of mapping in R! Here’s a summary:

- terra is the most important R package for working with rasters

  • plot() is useful for quickly plotting rasters, but I prefer ggplot2 functions for “final” mapping

  • Contains numerous functions for extracting, manipulating, and analyzing raster data

  • Has methods for manipulating data in vector form

- ggplot2 has several functions for plotting rasters including:

  • geom_raster() or geom_tile(): require a data frame
  • geom_spatraster(): requires a SpatRaster object
  • geom_stars(): uses the stars package, which was not covered here

Potentially useful tutorials, free books, etc.

Compilation of various resources
- R Spatial data science blogs

Books
- Geocomputation with R by R. Lovelace et al. (2022)
- Geographic Data Science with R by M.C. Wimberly (2023)
- A Crash Course in GIS using R by M. Branion-Calles (2021)
- Data Analysis and Visualization with R: Spatial by S. Sun (2023)

Tutorials/vignettes
- Spatial Data Science with R and “terra” by R. Hijmans (2019-2023)
- Geospatial Data Science in R by Zia Ahmed
- Introduction to GIS with R by Jesse Sadler
- GIS and Spatial Analysis with R by Manny Gimmond
- Vignette for sf
- R as GIS for Economists by Taro Mieno

Acknowledgements

Parts of these exercises were modified from a tutorial for the rnpn package created by Alyssa Rosemartin. Raster and observation data used for the demo were downloaded from the USA National Phenology Network’s servers using rnpn. Thanks to Roger Andre for providing helpful feedback on earlier versions of this document.